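Despite the empirical success of knowledge distillation, it still lacks a theoretical foundation that can naturally lead to computationally inexpensive implementations. To address this, we use a recently proposed entropy-like functional to forge an alternative connection between information theory and knowledge distillation. In doing so, we introduce two distinct, complementary losses that aim to maximise the correlation and the mutual information between the student and teacher representations. Our method achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks, while incurring significantly less training overhead than closely related and similarly performing approaches. We further demonstrate the effectiveness of our method on a binary distillation task, where we shed light on a new state of the art for binary quantisation. The code, evaluation protocols, and trained models will be made publicly available.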
translated by 谷歌翻译
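This work introduces a novel principle that we call disentanglement via mechanism sparsity regularization, based on the idea that the dynamics of high-level concepts are often sparse. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that relates them. We develop a rigorous identifiability theory, building on recent nonlinear independent component analysis (ICA) results, that formalizes this principle and shows how the latent variables can be recovered up to permutation if one regularizes the latent mechanisms to be sparse and if some graph connectivity criterion is satisfied by the data-generating process. As a special case of our framework, we show how interventions with unknown targets can be leveraged to disentangle the latent factors, thereby drawing further connections between ICA and causality. We also propose a VAE-based method in which the latent mechanisms are learned and regularized via binary masks, and we validate our theory by showing that it learns disentangled representations in simulations.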
In recent years, reinforcement learning (RL) has become increasingly successful in its application to science and the process of scientific discovery in general. However, while RL algorithms learn to solve increasingly complex problems, interpreting the solutions they provide becomes ever more challenging. In this work, we gain insights into an RL agent's learned behavior through a post-hoc analysis based on sequence mining and clustering. Specifically, frequent and compact subroutines, used by the agent to solve a given task, are distilled as gadgets and then grouped by various metrics. This process of gadget discovery develops in three stages: first, we use an RL agent to generate data; then, we employ a mining algorithm to extract gadgets; and finally, the obtained gadgets are grouped by a density-based clustering algorithm. We demonstrate our method by applying it to two quantum-inspired RL environments. First, we consider simulated quantum optics experiments for the design of high-dimensional multipartite entangled states, where the algorithm finds gadgets that correspond to modern interferometer setups. Second, we consider a circuit-based quantum computing environment, where the algorithm discovers various gadgets for quantum information processing, such as quantum teleportation. This approach for analyzing the policy of a learned agent is agent- and environment-agnostic and can yield interesting insights into any agent's policy.
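The mining stage described above can be illustrated with a small sketch. The abstract does not name the mining algorithm, so this example substitutes plain counting of contiguous action subsequences; the function `mine_gadgets`, its parameters, and the action labels are all hypothetical, and the clustering stage is left out.

```python
from collections import Counter


def mine_gadgets(episodes, min_len=2, max_len=3, min_support=2):
    """Return contiguous action subsequences that occur in enough episodes.

    Support is the number of episodes containing the subsequence at least
    once. This is a simplified stand-in for a sequence-mining algorithm.
    """
    support = Counter()
    for actions in episodes:
        seen = set()  # count each subsequence once per episode
        for n in range(min_len, max_len + 1):
            for i in range(len(actions) - n + 1):
                seen.add(tuple(actions[i:i + n]))
        support.update(seen)
    return {gadget: count for gadget, count in support.items()
            if count >= min_support}


# Hypothetical action sequences produced by a trained agent.
episodes = [
    ["H", "CNOT", "X", "M"],
    ["X", "H", "CNOT", "M"],
    ["H", "CNOT", "Z"],
]
gadgets = mine_gadgets(episodes)
print(gadgets)
```

In this toy data, only the pair `("H", "CNOT")` appears in at least two episodes, so it is the single candidate gadget that would be passed on to a clustering step.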
Aliasing is a highly important concept in signal processing, as careful consideration of resolution changes is essential in ensuring transmission and processing quality of audio, image, and video. Despite this, up until recently aliasing has received very little consideration in Deep Learning, with all common architectures carelessly sub-sampling without considering aliasing effects. In this work, we investigate the hypothesis that the existence of adversarial perturbations is due in part to aliasing in neural networks. Our ultimate goal is to increase robustness against adversarial attacks using explainable, non-trained, structural changes only, derived from aliasing first principles. Our contributions are the following. First, we establish a sufficient condition for no aliasing for general image transformations. Next, we study sources of aliasing in common neural network layers, and derive simple modifications from first principles to eliminate or reduce it. Lastly, our experimental results show a solid link between anti-aliasing and adversarial attacks. Simply reducing aliasing already results in more robust classifiers, and combining anti-aliasing with robust training out-performs solo robust training on $L_2$ attacks with none or minimal losses in performance on $L_{\infty}$ attacks.
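A one-dimensional toy example may help show what sub-sampling without considering aliasing means. It is not the paper's layer modifications: the 3-tap binomial kernel, the circular padding, and the function names are assumptions chosen for illustration.

```python
def subsample(signal, stride=2):
    """Keep every `stride`-th sample (no filtering)."""
    return signal[::stride]


def blur(signal, kernel=(0.25, 0.5, 0.25)):
    """Low-pass filter the signal, using circular padding at the edges."""
    half = len(kernel) // 2
    padded = signal[-half:] + signal + signal[:half]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


# A signal oscillating at the highest representable frequency.
signal = [1.0, -1.0] * 4

naive = subsample(signal)               # aliased: looks like a constant
anti_aliased = subsample(blur(signal))  # high frequency removed first

print(naive)
print(anti_aliased)
```

Naive stride-2 sub-sampling turns the oscillation into a constant signal of ones, a spurious low-frequency component. Blurring first removes the frequency that the coarser grid cannot represent, so the sub-sampled result is all zeros.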
Deep learning models have shown promising results in recognizing depressive states using video-based facial expressions. While successful models typically leverage 3D-CNNs or video distillation techniques, the differing use of pretraining, data augmentation, preprocessing, and optimization techniques across experiments makes it difficult to make fair architectural comparisons. We propose instead to enhance two simple models based on ResNet-50 that use only static spatial information by using two specific face alignment methods and improved data augmentation, optimization, and scheduling techniques. Our extensive experiments on benchmark datasets obtain similar results to sophisticated spatio-temporal models for single streams, while the score-level fusion of two different streams outperforms state-of-the-art methods. Our findings suggest that specific modifications in the preprocessing and training process result in noticeable differences in the performance of the models and could hide the actual effect originally attributed to the use of different neural network architectures.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
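The interactionism model introduces a dynamic approach to language, communication, and cognition. In this work, we explore this fundamental theory in the context of dialogue modelling for spoken dialogue systems (SDS). To extend such a theoretical framework, we present a set of design principles that adhere to central psycholinguistic and communication theories in order to achieve interactionism in SDS. From these, key ideas are linked to form the basis of our proposed design principles.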
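In recent years, transformer architectures have been growing in popularity. The Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression comprehension, referring expression segmentation, and visual question answering. One remarkable aspect of the model is its capacity to infer over classes that it was not previously trained on. In this work, we explore the use of MDETR in a new task, action detection, without any previous training. We obtain quantitative results using the Atomic Visual Actions dataset. Although the model does not achieve the best performance on the task, we believe this is an interesting finding. We show that it is possible to use a multi-modal model to tackle a task that it was not designed for. Finally, we believe that this line of research may lead to the generalization of MDETR to additional downstream tasks.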
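Drowsiness is a major concern for drivers and one of the leading causes of traffic accidents. Advances in cognitive neuroscience and computer science have enabled the detection of driver drowsiness using brain-computer interfaces (BCIs) and machine learning (ML). Nevertheless, several challenges remain open and should be addressed. First, a comprehensive evaluation of drowsiness detection performance using a heterogeneous set of ML algorithms is missing from the literature. Second, the detection performance of scalable ML models suitable for groups of subjects needs to be studied and compared with the individual models proposed in the literature. To address these limitations, this work presents an intelligent framework that employs BCIs and features based on electroencephalography (EEG) to detect drowsiness in driving scenarios. The SEED-VIG dataset is used to feed different ML regressors and three-class classifiers, and then to evaluate, analyze, and compare the best-performing models for individual subjects and for groups of subjects. For individual models, Random Forest (RF) obtained a 78% F1-score, improving on the 58% obtained by models used in the literature such as Support Vector Machine (SVM). For scalable models, RF reached a 79% F1-score, demonstrating the effectiveness of these approaches. The lessons learned can be summarized as follows: i) not only SVM but also other models not sufficiently explored in the literature are relevant for drowsiness detection, and ii) scalable approaches suitable for groups of subjects are also effective at detecting drowsiness, even when evaluated on new subjects that were not included in model training.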
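For readers unfamiliar with the F1-score reported above, the following sketch computes it for a three-class problem. The abstract does not state how the per-class scores are averaged, so macro averaging is an assumption here; the class labels and the predictions are hypothetical and are not taken from SEED-VIG.

```python
def f1_per_class(y_true, y_pred, labels):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), with 0.0 for empty classes."""
    scores = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical three-class labels and predictions.
labels = ["awake", "tired", "drowsy"]
y_true = ["awake", "awake", "tired", "tired", "drowsy", "drowsy"]
y_pred = ["awake", "tired", "tired", "tired", "drowsy", "awake"]

per_class = f1_per_class(y_true, y_pred, labels)
overall = macro_f1(y_true, y_pred, labels)
print(per_class, round(overall, 4))
```

Because F1 combines precision and recall per class, it is less easily inflated by a dominant class than plain accuracy, which is one reason it is commonly reported for imbalanced vigilance data.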
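Deploying autonomous robots capable of exploring unknown environments has long been a topic of great relevance to the robotics community. In this work, we take a step in this direction by presenting an open-source active visual SLAM framework that exploits the structure provided by the underlying pose graph. Through careful estimation of a posteriori weighted pose graphs, D-optimal decision-making is achieved online, with the objective of improving localization and mapping uncertainty as exploration takes place.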
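The idea of scoring a weighted pose graph with a D-optimality criterion can be sketched as follows. The abstract does not give the utility function, so this example assumes a common formulation: the n-th root of the determinant of the weighted graph Laplacian with the anchored node removed. The edge weights, graph layout, and function names are hypothetical.

```python
import math


def reduced_laplacian(num_nodes, weighted_edges):
    """Weighted graph Laplacian with node 0 (the anchored pose) removed."""
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, w in weighted_edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return [row[1:] for row in lap[1:]]


def log_det(matrix):
    """Log of |det| via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def d_optimality(matrix):
    """D-optimality criterion: det(matrix) ** (1 / n), computed in log space."""
    return math.exp(log_det(matrix) / len(matrix))


# Hypothetical pose graph: an odometry chain 0-1-2-3 with unit weights.
chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
# The same graph after a candidate action closes the loop between 3 and 0.
loop = chain + [(3, 0, 1.0)]

score_chain = d_optimality(reduced_laplacian(4, chain))
score_loop = d_optimality(reduced_laplacian(4, loop))
print(score_chain, score_loop)
```

Under this assumed criterion, the loop-closing candidate scores higher than the open chain, which matches the intuition that revisiting known places reduces uncertainty. Working in log space avoids overflow when the graph grows.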